Abstract

A novel integer value-sorting technique is proposed replacing bucket sort, distribution
counting sort and address calculation sort family of algorithms. It requires only a constant
amount of additional memory. The technique is inspired by one of the ordinal theories of
"serial order in behavior" and explained by analogy with the three main stages in the
formation and retrieval of memory in cognitive neuroscience, namely (i) practicing,
(ii) storing and (iii) retrieval.

Although not suitable for integer rank-sorting, where the problem is to put an array of
elements into ascending or descending order by their numeric keys, each of which is an
integer, the technique seems to be efficient and applicable to rank-sorting, as well as to
other problems such as hashing, searching, element distinction, succinct data structures,
gaining space, etc.



1 Introduction

An integer value-sorting algorithm puts an array of integers into ascending or descending 
order by their values, whereas a rank-sorting algorithm puts an array of elements (satellite 
information) into ascending or descending order by their numeric keys, each of which is an 
integer. It is possible that a rank-sorting algorithm can be used in place of a value-sorting 
algorithm, since if each element of the array to be sorted is itself an integer and used as 
the key, then rank-sorting degenerates to value-sorting, but the converse is not always 
true. 
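
The distinction can be illustrated with a minimal sketch. Python's built-in `sorted` is used here only as a stand-in for any sorting algorithm, and the function names `value_sort` and `rank_sort` are hypothetical.

```python
def value_sort(values):
    """Value-sorting: the integers themselves are the data to be ordered."""
    return sorted(values)


def rank_sort(records):
    """Rank-sorting: order (key, satellite) records by their integer key."""
    return sorted(records, key=lambda record: record[0])


values = [5, 2, 9, 2]
records = [(5, "e"), (2, "b"), (9, "i"), (2, "a")]

print(value_sort(values))   # integers only
print(rank_sort(records))   # satellite information travels with each key
```

When every record is just its own key, `rank_sort` yields the same key order as `value_sort`, which is the degenerate case described above.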

The technique described in this study is suitable for arrays of integers where the
integers are laid out in contiguous locations of the memory. Zero-based indexing is considered
while accessing the integers, e.g., A[0] and A[n - 1] are the first and last integers of the
array, respectively, where n is the number of integers of the array.

The nervous system is considered to be closely related to, and described by, the "serial
order in behavior" in cognitive neuroscience [HEj, with three basic theories which cover
almost all abstract data types used in computer science. These are chaining theory,
positional theory and ordinal theory [3].

Chaining theory is the extension of stimulus-response (reflex chain) theory, where 
each response can become the stimulus for the next [3]. From an information processing 
perspective, comparison based sorting algorithms that sort the arrays by making a series 
of decisions relying on comparing keys can be classified under chaining theory. Each 
comparison becomes the stimulus for the next. Hence, keys themselves are associated 
with each other. Some important examples are quick sort [1], shell sort [5], merge sort [6] 
and heap sort [7]. 

Positional theory assumes order is stored by associating each element with its position 
in the sequence. The order is retrieved by using each position to cue its associated element. 
This is the method by which conventional (Von Neumann) computers store and retrieve 
order, through routines accessing separate addresses in memory [3]. Content-based sorting 
algorithms where decisions rely on the contents of the keys can be classified under this 
theory. Each key is associated with a position depending on its content. Some important 
examples are distribution counting sort [8], [9], address calculation sort [10]-[15], bucket
sort [16], [17] and radix sort [16]-[19].

Ordinal theory assumes order is stored along a single dimension, where that order is 
defined by relative rather than absolute values on that dimension. Order can be retrieved 
by moving along the dimension in one or the other direction. This theory need not assume
either the item-item or the position-item associations of the previous theories [3].

One of the ordinal theories of serial order in behavior is that of Shiffrin and Cook
[20], which suggests a model for short-term forgetting of item and order information of
the brain. It assumes associations between elements and a "node", but only the nodes 
are associated with one another. By moving inwards from nodes representing the start 
and end of the sequence, the associations between nodes allow the order of items to be 
reconstructed [5]. 

The technique presented in this study is inspired by the ordinal model of Shiffrin and
Cook. As in that model, it is assumed that the associations are between the integers in the
array space and the nodes in an imaginary linear subspace (ILS) that spans a predefined
range of integers. The range of the integers spanned by the ILS is upper bounded by the
number of integers n but may be smaller, and the ILS can be located anywhere provided that
its boundaries do not cross over those of the array. This makes the technique in-place,
i.e., besides the input array, only a constant number of memory locations is used for
storing counters and indices. An association between an integer in the array space and the
ILS is created by a node using a monotone bijective hash function that maps the integers in
the predefined interval to the ILS. When a particular distinct integer is mapped to the ILS,
a node is created reserving all the bits of the integer except for the most significant bit
(MSB), which is used to tag the word as a node of the ILS for interrogation purposes. The
reserved bits become the record of the node, which can then be used to count (practice)
other occurrences of the particular integer that created the node. When all the integers of
the predefined interval are practiced, the nodes can be stored at the beginning of the array
(short-term memory) retaining their relative order together with the information (cue)
required to construct the sorted permutation of the practiced interval. Afterwards, the
short-term memory is processed and the sorted permutation of the practiced interval is
retrieved over the array space in linear time using only a constant amount of additional
memory.

The adjective "associative" derives from two facts, the first of which will become apparent
with the description of the technique in the following paragraphs. The second is that,
although it replaces all derivatives of the content-based sorting algorithms such as
distribution counting sort [8], [9], address calculation sort [10]-[15] and bucket sort
[16], [17] on a RAM, it seems to be more efficient on a "content addressable memory" (CAM),
also known as "associative memory", which in one word time finds a matching segment in the
tag portion of a word and reaches the remainder of the word [21]. In the current version of
associative sort developed on a RAM, the nodes of the imaginary linear subspace (tagged
words) and the integers of the array space (untagged words) are processed sequentially,
whereas retrieving the previous or next tagged or untagged word would be a matter of one
word time for a CAM.

1.1 Definitions

Given an array A of n integers, A[0], A[1], ..., A[n - 1], the problem is to sort the integers
in ascending or descending order. The notations used throughout the study are:

(i) Universe of integers is assumed $U = [0 \ldots 2^w - 1]$ where w is the fixed word length.

(ii) Maximum and minimum integers of an array are $\max(A) = \max(a \mid a \in A)$ and
$\min(A) = \min(a \mid a \in A)$, respectively. Hence, the range of the integers is
$m = \max(A) - \min(A) + 1$.

(iii) The notation $B \subset A$ is used to indicate that B is a proper subset of A.

(iv) For two arrays $A_1$ and $A_2$, $\max(A_1) < \min(A_2)$ implies $A_1 < A_2$.

2 In-place Associative Integer Sorting 

The main difficulty of all distributive sorting algorithms is that, when the keys are
distributed using a hash function according to their content, several of them may be
clustered around a locus, and several may be mapped to the same location. These problems
are solved by the three inherent basic steps of associative sort, (i) practicing,
(ii) storing and (iii) retrieval, which correspond to the three main stages in the formation
and retrieval of memory in cognitive neuroscience:

(i) Encoding or registration: receiving, processing and combining of received information.

(ii) Storage: creation of a permanent record of the encoded information. 

(iii) Retrieval, recall or recollection: calling back the stored information in response to 
some cue for use in a process or activity. 

2.1 Practicing 

An association between an integer in the array space and the ILS is created by a node
using a monotone bijective hash function that maps the integers in the predefined interval
to the ILS. The process of creating a node by mapping a distinct integer to the ILS is
"practicing a distinct integer of an interval". Since the ILS is defined on the array space,
mapping a distinct integer to the ILS is just an exchange operation. Once a node is
created, the redundancy due to the association between the integer and the position of
the node (the position where the integer is mapped) releases the word allocated to the
integer in the physical memory, except for one bit which tags the word as a node for
interrogation purposes. The tag bit discriminates the word as a node, and the position of
the node lets the integer be retrieved back from the ILS using the inverse hash function.
This is "integer retrieval". All the bits of the node except the tag bit can be cleared
and used to store any information. Hence, they are the "record" of the node, and the
information stored in the record is the "cue", the term by which cognitive neuroscientists
describe the way the brain recalls the successive items in an order during retrieval. For
instance, while another occurrence of a particular integer is being practiced, it is known
from the tag bit that a node has already been created, which provides the opportunity to
count the other occurrences. The process of counting other occurrences of a particular
integer is "practicing all the integers of an interval", i.e., rehearsing, the term used by
cognitive neuroscientists to describe the way the brain manipulates the sequence before
storing it in a short (or long) term memory.
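
The tag bit, the record and integer retrieval can be sketched as follows. This is a minimal illustration and not the author's implementation: the word length `W = 32`, the hash function `j = A[i] - delta` and all function names are assumptions made for the example.

```python
W = 32                   # assumed word length w
TAG = 1 << (W - 1)       # most significant bit tags a word as a node
RECORD_MASK = TAG - 1    # the remaining w - 1 bits form the record


def is_node(word):
    """A word is a node if its tag bit is set."""
    return bool(word & TAG)


def record_of(word):
    """Return the w - 1 record bits of a node."""
    return word & RECORD_MASK


def create_node(A, i, delta):
    """Practice the distinct integer A[i]: exchange it into its ILS position.

    Assumes the hash function j = A[i] - delta (hypothetical choice).
    """
    j = A[i] - delta
    A[i] = A[j]          # the displaced word takes the freed slot
    A[j] = TAG           # node with a cleared (zero) record
    return j


def retrieve_integer(j, delta):
    """Inverse hash: the position of a node is enough to recover its integer."""
    return j + delta


A = [12, 10, 11]
j = create_node(A, 0, delta=10)
print(j, is_node(A[j]), record_of(A[j]), retrieve_integer(j, 10))
```

The integer 12 no longer appears in the array after the exchange, yet it is recovered from the node position alone, which is the redundancy the paragraph describes.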

Practicing does not need to alter the value of other occurrences. Only the first
occurrence, from which a node is created, is altered while being practiced. All other
occurrences of that particular integer remain in the array space but become meaningless.
Hence they are "idle integers". Practicing does not need to alter the position of idle
integers either, unless another distinct integer creates a node exactly at the position
of an idle integer while being practiced. In such a case, the idle integer is moved to the
former position of the integer that creates the new node. This makes associative sort
unstable, i.e., equal integers may not retain their original relative order. However, an
ILS can create other subspaces and associations using the idle integers that were already
practiced by manipulating either their position or value or both. Hence, a part of linear
algebra and related fields of mathematics can be applied on subspaces to solve such
problems.

Universe of Integers. When an integer is first practiced, a node is created, freeing the w
bits of the integer. One bit is used to tag the word as a node. Hence, it is reasonable
to suspect that the tag bit limits the universe of integers, because all the integers should
be untagged and in the range $[0, 2^{w-1} - 1]$ before being practiced. But we can

(i) partition A into 2 disjoint sublists $A_1 < 2^{w-1} \le A_2$ in O(n) time with well-known
in-place partitioning algorithms, or in a stable manner with [21],

(ii) shift all the integers of $A_2$ by $-2^{w-1}$, sort $A_1$ and $A_2$ associatively and shift
$A_2$ back by $2^{w-1}$.
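
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: a simple unstable two-pointer partition is used, and `builtin_sort_range` (built on Python's `sorted`) stands in for the associative sort of each half. Both function names are hypothetical.

```python
def builtin_sort_range(A, start, stop):
    """Stand-in for associatively sorting A[start:stop] (assumption)."""
    A[start:stop] = sorted(A[start:stop])


def sort_full_universe(A, w, sort_range=builtin_sort_range):
    """Sort w-bit integers with a sorter that only accepts (w - 1)-bit values."""
    half = 1 << (w - 1)

    # (i) In-place, unstable partition: A[0:lo] < half <= A[lo:].
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        if A[lo] < half:
            lo += 1
        else:
            A[lo], A[hi] = A[hi], A[lo]
            hi -= 1

    # (ii) Shift the upper sublist down so every word has a clear MSB,
    # sort both sublists, then shift the upper sublist back.
    for k in range(lo, len(A)):
        A[k] -= half
    sort_range(A, 0, lo)
    sort_range(A, lo, len(A))
    for k in range(lo, len(A)):
        A[k] += half
    return A


print(sort_full_universe([200, 3, 129, 127, 0, 255, 128], w=8))
```

Because every integer of the first sublist is smaller than every integer of the second, sorting the two sublists separately leaves the whole array sorted.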



There are other methods to overcome this problem. For instance,

(i) sort the sublist A[0 ... (n / log n) - 1] using the optimal in-place merge sort [25],

(ii) compress A[0 ... (n / log n) - 1] by Lemma 1 of [23], generating $\Omega(n)$ free bits,

(iii) sort A[(n / log n) ... n - 1] associatively using the $\Omega(n)$ free bits as tag bits,

(iv) uncompress A[0 ... (n / log n) - 1] and merge the two sorted sublists in-place in linear
time by [25].

Number of Integers. If practicing a distinct integer lets us use its w - 1 bits to
practice other occurrences of that particular integer, we have w - 1 free bits by which we
can count up to $2^{w-1}$ occurrences, including the first integer that created the node.
Hence, it is reasonable to suspect again that there is another restriction on the size of
the arrays, i.e., $n \le 2^{w-1}$, since an integer may occur more than $2^{w-1}$ times in an
array of $n > 2^{w-1}$ integers. But an array can be divided into two parts in O(1) time,
and those parts can be merged in-place in linear time by [26] after being sorted
associatively.

Hence, for the sake of simplicity, it will be assumed that $n \le 2^{w-1}$ and all the integers
are in the range $[0, 2^{w-1} - 1]$ throughout the study.

2.2 Storage 

Once all the integers in the predefined interval are practiced, the nodes dispersed in the ILS
are clustered in a systematic way, closing the distance between them in one direction while
retaining their relative order. This is the storage phase of associative sort, where the
received, processed and combined information required to construct the sorted permutation of
the practiced interval is stored in the short-term memory (e.g., the beginning of the array).
When the nodes are moved in one direction, it is not possible to retain the association
between the ILS and the array space. However, the record of a node can be further used to
encode the absolute (former) position of that node as well, or the relative position
(with respect to the ILS), or how much that node is moved relative to its absolute or
relative position during storing. Unfortunately, this requires that a record is large enough
to store both the positional information and the number of idle integers practiced by that
node. However, as explained earlier, further associations can be created using the idle
integers that were already practiced by manipulating either their position or value or
both. Hence, if the record is large enough, it can store both the positional information and
the number of idle integers. If not, an idle integer can be associated with the node,
accompanying it to supply additional space to store the positional information. This
immediately recalls one of the main difficulties of distributive sorting algorithms: when
the keys are distributed using a hash function according to their content, several of them
may be clustered around a locus. Hence, it is reasonable to expect difficulty in associating
an idle integer with the node it should accompany. However, as explained earlier, the range
of the integers spanned by the ILS is upper bounded by n but may be smaller, and the ILS can
be located anywhere on the array space, making the technique in-place.

2.3 Retrieval 

Finally, the sorted permutation of the practiced interval is constructed in the array space
using the information stored in the short-term memory. This is the retrieval phase of
associative sort, which depends on the information stored in the record of a particular
node. If the record is large enough, it stores both the position of the node and the number
of practiced idle integers. If not, an associated idle integer accompanying the node stores
the position of the node while the record holds the number of practiced idle integers. The
positional information cues the recall of the integer using the inverse hash function. This
is "integer retrieval" from the imaginary subspace. Hence, the retrieved (recalled) integer
can be copied onto the array space as many times as it occurs. It should be noticed that one
can process the information in the short-term memory from right to left and distinguish an
idle integer (untagged word) from a node (tagged word). From right to left, an (untagged)
idle integer implies that it is accompanying the preceding (tagged) node for additional
storage.

Hence, moving through the nodes that represent the start and end of the practiced integers,
while retaining their relative associations with each other by cuing even when their
positions are altered, allows the order of the integers to be constructed in-place in linear
time.

From the complexity point of view, associative sort shows characteristics similar to those of
bucket sort and distribution counting sort. Hence, it can be thought of as in-place
associative bucket sort or in-place associative distribution counting sort. Distribution
counting sort is seldom discussed in the literature, although it has been around for more
than 50 years since it was proposed by Seward [8] in 1954 and by Feurzig [9] in 1960,
independently, and is known to be the method that makes radix sort possible on digital
computers. It is known to be very powerful when the integers have a small range. Given n
integers A[0 ... n - 1] each in the range [0, m - 1], its time complexity is O(n + m) and it
requires n + m additional space for a stable and m for an unstable sort. Hence, distribution
counting sort becomes efficient and practical when m = O(n), which defines its time-space
trade-off. On the other hand, bucket sort is a generalization of distribution counting sort.
In fact, if each bucket has size 1, then bucket sort degenerates to distribution counting
sort. However, the variable bucket size allows it to use O(n) memory instead of O(m + n)
memory. Its average case time complexity is O(n + m), and if m = O(n), then it becomes O(n).
Its worst case time complexity is $O(n^2)$.
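
For reference, the unstable value-sorting form of distribution counting sort described above can be sketched in a few lines. This is the textbook variant with m extra counters, not associative sort, and the function name is hypothetical.

```python
def counting_sort_values(A, m):
    """Sort integers in the range [0, m - 1] in O(n + m) time with m counters."""
    counts = [0] * m              # the m words of additional space
    for x in A:                   # O(n): count each value
        counts[x] += 1

    k = 0
    for value in range(m):        # O(n + m): rewrite A in ascending order
        for _ in range(counts[value]):
            A[k] = value
            k += 1
    return A


print(counting_sort_values([3, 0, 2, 3, 1, 0], m=4))
```

The counter array is exactly the additional storage that associative sort aims to avoid by keeping the counts inside the records of the nodes.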

With this introductory information, the contribution of this study is

A practical sorting algorithm that sorts n integers A[0 ... n - 1] each in the range
[0, m - 1] using O(1) extra space in O(n + m) time for the worst, O(m) time for the average
(uniformly distributed integers) and O(n) time for the best case. The ratio $m/n$ defines the
efficiency (time-space trade-off) of the algorithm, letting very large arrays be sorted
in-place. The algorithm is simple and practical, replacing bucket sort, distribution counting
sort and address calculation sort family of algorithms and improving the space requirement
to only O(1) extra words.

Practical comparisons for 1 million 32-bit integers with quick sort showed that associative
sort is roughly 2 times faster for uniformly distributed integers when m = n. When
$m/n = 10$, the performances are the same. When $m/n = 1/10$, associative sort becomes more
than 3 times faster than quick sort. If the distribution is exponential, associative sort
shows better performance up to $m/n \approx 25$ when compared with quick sort.

Practical comparisons for 1 million 32-bit integers showed that radix sort is 2 times faster
for uniformly distributed integers when m = n. However, associative sort is slightly better
than radix sort when $m/n = 1/10$. Further decreasing the ratio to $m/n = 1/100$, associative
sort becomes more than 2 times faster than radix sort.

Practical comparisons for 1 million 32-bit integers showed that the value-sorting version of
distribution counting sort (frequency counting sort [16]) is 2 times faster than associative
sort for $m/n = 1$. Distribution counting sort is still slightly better but the performances get
closer when $m/n < 1/10$ and $m/n > 10$.

Even omitting its space efficiency for a moment, associative sort asymptotically outperforms
all content-based sorting algorithms when n is large relative to m.

The technique seems to be efficient and applicable for other problems as well, such
as hashing, searching, element distinction, succinct data structures, gaining space, etc.
For instance, there are several space gaining techniques available and widely used in the
literature for in-place and minimum space algorithms [22]-[25]. However, as far as the
author knows, all these in-place and minimum space algorithms have a dedicated explicit
technique that is used only for space gaining purposes. On the contrary, gaining space
is an inherent step of associative sort which improves its performance and can be used
explicitly.

3 Basics of Associative Sort 

Given n distinct integers A[0 ... n - 1] each in the range $[\delta, \delta + m - 1]$ where
$\delta = \min(A)$, if m = n and A is the sorted permutation, then there is a bijective
relation $i = A[i] - \delta$ between each integer and its position. Conversely,
$i \ne A[i] - \delta$ implies that the integer A[i] is not at its exact position. Its exact
position can be calculated by $j = A[i] - \delta$. Therefore, the simple monotone bijective
hash function $j = A[i] - \delta$ that maps the integers to $j \in [0, n - 1]$ can sort the
array in O(n) time using O(1) constant space. This is in-situ permutation (cycle leader
permutation), where A is re-arranged by following the cycles of a permutation $\pi$. First
A[0] is sent to its final position $\pi(0)$ (calculated by $j = A[i] - \delta$). Then the
element that was in $\pi(0)$ is sent to its final position $\pi(\pi(0))$. The process
proceeds in this way until the cycle is closed, that is, until the integer addressing the
first position is found, which means that the association $0 = A[0] - \delta$ is constructed
between the integer and its position. Then the iterator is increased to continue with A[1].
At the end, when all the cycles of A[i] for i = 0, 1, ..., n - 1 are processed, all the
integers are in their exact positions and the association $i = A[i] - \delta$ is constructed
between the integers and their positions, resulting in the sorted permutation.
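
The cycle leader permutation described above can be sketched as follows. The sketch assumes n distinct integers with m = n, exactly as in the text; the function name is hypothetical.

```python
def cycle_leader_sort(A):
    """Sort n distinct integers whose range equals n, in place.

    Uses the hash function j = A[i] - delta with delta = min(A).
    """
    delta = min(A)
    i = 0
    while i < len(A):
        j = A[i] - delta          # exact position of the integer A[i]
        if j != i:
            # Send A[i] to its final position; the displaced integer is
            # examined next, which continues the current cycle.
            A[i], A[j] = A[j], A[i]
        else:
            i += 1                # cycle closed: i = A[i] - delta holds
    return A


print(cycle_leader_sort([7, 4, 6, 5, 3]))
```

Every exchange puts at least one integer into its final position, so the number of exchanges is at most n and the total running time is linear.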

If we look at this approach more closely, we can interpret the technique entirely
differently. That is, we are in fact creating an ILS Im[0 ... n - 1] over A[0 ... n - 1],
where the relative basis of this ILS coincides with that of the array space in the physical
memory. The ILS spans a predefined interval of the range of integers depending on n. Since
m = n, it spans the entire range of integers. The association between the array space and the
ILS is created by a node using the monotone bijective hash function $i = A[i] - \delta$ that
maps a particular integer to the ILS. When a node is created for a particular integer,
the redundancy due to the association between the integer and the position of the node
releases the word allocated for the integer in the physical memory. Hence, we can clear
the node (A[i] = 0) and set its tag bit, for instance its most significant bit (MSB), to
discriminate it as a node, and use the remaining w - 1 bits of the node for any other
purpose. When we want the integer back in the array space from the ILS, we can use the
inverse of the hash function and get the integer back by $A[i] = i + \delta$. However, we
do not use the free bits of a node for other purposes in this case, because it is known that
all the integers are distinct and hence only one integer will be practiced at a location
creating a node. Therefore, instead of tagging the word as a node using its MSB, we use the
integer itself to tag the word "implicitly" as a node, since if an integer is mapped to the
ILS, then it will always satisfy the hash function $i = A[i] - \delta$. Hence, the integers
are "implicitly practiced" in this case.

Mathematically, consider an array of n distinct integers A[0 ... n - 1] each in the range
[0, U] and stored sequentially in the RAM. Let U denote the field of positive integers
including zero, and consider the elements of the array as a set of 3-tuples x = [i, A[i], 1]
of integers forming a vector space over U denoted by $U^3$. Hence, A[0 ... n - 1] is a
3-dimensional vector space over U, and any element of the array is represented by 3 integer
components, where the first one in [0, n - 1] represents the index, i.e., the position of the
element in the array, the second one in [0, U] represents the element (either an integer or
a node) stored at that position, and the third one is a dummy constant.

Now, consider an ILS Im[0 ... n - 1] which is a vector space over a given interval of the
range of integers of A as a set of 3-tuples x = [i', A[i'], 1] of integers. This time, the
second component is a subset of integers of A in the range [a, b] with b - a < n. It should
be noted that, for an array of n integers each in the range $[\delta, \delta + m - 1]$ where
$\delta = \min(A)$, if m = n, i.e., the number of integers is equal to the range of integers,
then the ILS spans the entire range of integers.

A bijective linear mapping from the array space to the ILS can be defined as,

with $\delta = \min(A)$. Eqn. 3.1 tells us that when an integer is mapped into the ILS, its
new position will be $i' = A[i] - \delta$ in both the ILS and the array space while its value
is unchanged, i.e., A[i'] = A[i]. From the algorithmic point of view, this is equivalent to
swapping A[i] with A[i']. The linear mapping in Eqn. 3.1 does not have an inverse. This is
to be expected when we consider that after swapping A[i] with A[i'] there is no way to know
where A[i'] was before. However, when we look at the right side of this equation more
closely, we immediately see the redundancy between the new position of an integer
($A[i] - \delta$) and its value (A[i]), provided that $\delta$ is known. This redundancy is
the fact that makes cycle leader permutation possible. Therefore, cycle leader permutation
is a special case of associative sort.
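
As a worked illustration of the description above, and only as an assumed form rather than the author's own matrix, a mapping on the 3-tuples with the stated properties (new position $i' = A[i] - \delta$, unchanged value, dummy constant 1) can be written as:

```latex
\begin{bmatrix} i' \\ A[i'] \\ 1 \end{bmatrix}
=
\begin{bmatrix}
0 & 1 & -\delta \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} i \\ A[i] \\ 1 \end{bmatrix}
```

The first column of this matrix is zero, so its determinant is zero. This agrees with the remark that the mapping has no inverse: the old index i does not influence the result and therefore cannot be recovered from it.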

This mathematical definition gives us other opportunities. For instance, we can define
our transformation matrix as,

which says that the w bits of an integer that is mapped into the ILS at i' can be cleared and
used for any other purpose, because the integer can be retrieved back to any location (for
instance to i) by,

provided that $\delta = \min(A)$ is known.

An imaginary linear subspace can be defined anywhere on the array space A[0 ... n - 1]
provided that its boundaries do not cross over those of the array space. An ILS can be
defined as Im[a ... b] with b - a < n. Two supplementary definitions should be given:

(i) The relative basis of the subspace over the array space. This is defined by the
statement Im[a ... b] over A[u ... v], which strictly implies that v - u = b - a.

(ii) The interval of the range of integers spanned by Im[a ... b]. It is immediate that
Im[a ... b] can span any subset $A_1 \subset A$ where each integer of $A_1$ is in
$[\beta, \beta + b - a]$ with $\beta \in A$.

Furthermore, an ILS of a small interval that is created casually anywhere on the array 
space can be moved with all its nodes to a given direction (left or right) with respect 
to the array space provided that the hash function that associates the subspace and the 
array space is shifted as much as the subspace is shifted relative to the array space. This 
is "subspace shifting". 
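
Subspace shifting can be sketched as follows. The sketch assumes a hash function of the form `pos = x - beta + base`, where `base` is the array position of the first word of the subspace and `beta` is the smallest integer it spans. The word length, the function names and the choice to swap displaced words to the right are all assumptions made for the example.

```python
TAG = 1 << 31   # assumed tag bit (MSB of a 32-bit word)


def retrieve(pos, base, beta):
    """Inverse hash for a subspace whose first word sits at index `base`."""
    return pos - base + beta


def shift_subspace_left(A, a, b, s):
    """Move the words A[a..b] s positions to the left (requires s <= a).

    Words displaced by the move are swapped to the right so that nothing is
    lost. Returns the new base, i.e. the hash function shifted by s as well.
    """
    for k in range(a, b + 1):
        A[k - s], A[k] = A[k], A[k - s]
    return a - s


# Subspace Im[2..4] spanning [20, 22]: nodes for 20 (index 2) and 22 (index 4).
A = [7, 9, TAG, 50, TAG]
base = 2
before = [retrieve(p, base, 20) for p, word in enumerate(A) if word & TAG]

base = shift_subspace_left(A, 2, 4, 2)
after = [retrieve(p, base, 20) for p, word in enumerate(A) if word & TAG]
print(before, after)
```

The integers retrieved from the nodes are the same before and after the move because the offset of the hash function changes by the same amount as the positions of the nodes.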

A node is an association (interconnection) between the ILS and the array space. It is
created by mapping an integer from the array space into the ILS during practicing. A
monotone bijective hash function is used for the mapping. The necessary requirements
for a particular integer of the array space to be mapped into an ILS, creating a node,
are:

(i) The integer should be in the interval of the range of integers spanned by the ILS. 

(ii) There should not be a node already created by another occurrence of that particular 
integer. 

4 Sorting n Integers 

In this section, the associative sorting technique will be introduced which is based on the 
three basic steps: (i) practicing, (ii) storing and (iii) retrieval. 

Using the w - 1 bits (record) of a node released while a particular integer is being
practiced, other occurrences of that particular integer can be counted. Unfortunately, we
need log n bits of the record to encode the absolute position of the node during storing.
Hence, it is reasonable to suspect that we can count only up to $2^{w-1-\log n}$ occurrences,
including the first occurrence that created the node. Fortunately, this is not the case. We
can count using all w - 1 bits of the record, and while the nodes are being stored at the
beginning of the array (short-term memory), we can place an idle integer immediately after a
node that has practiced at least $2^{w-1-\log n}$ integers and write the absolute position of
that node over the idle integer as the cue. In this case, the record of the node stores only
the number of idle integers practiced by that node. But this time, we encounter another
serious problem of all distributive sorting algorithms. Depending on the distribution of the
integers, the nodes that are created during practicing are dispersed over the ILS. If the
nodes are clustered at the beginning, how can an idle integer be inserted immediately after a
particular node if there is another node immediately before and after that particular node
during storing? The answer is in the pigeonhole principle, which says that,

Corollary 4.1. Given n integers A[0 ... n - 1], the maximum number of distinct integers
that may simultaneously occur in A at least $2^{w-1-\log n}$ times is

$$\left\lfloor \frac{n}{2^{w-1-\log n}} \right\rfloor \tag{4.1}$$

Hence, if the size of the array is, say, $n = 2^{w-1}$, the maximum number of distinct integers
that may simultaneously occur in A at least 1 time is n. But the node itself represents the
first occurrence which creates it. Therefore,

Corollary 4.2. The maximum number of nodes that can each practice at least $2^{w-1-\log n}$
integers, and hence need an idle integer immediately after themselves during storing, is equal to

$$e = \left\lfloor \frac{n}{2^{w-1-\log n} + 1} \right\rfloor \tag{4.2}$$

and upper bounded by n/2.

This means that,

Corollary 4.3. If the integers are practiced to Im[e ... n - 1] over A[e ... n - 1] where e is
calculated by Eqn. 4.2, then there will be e integers at the beginning of the array, either
idle or out of the practiced interval, which will prevent collisions while inserting idle
integers immediately after the nodes that practiced at least $2^{w-1-\log n}$ integers.

Hence, 

Lemma 4.4. Given $n \le 2^{w-1}$ integers A[0 ... n - 1] each in the range $[0, 2^{w-1} - 1]$,
all the integers in the range $[\delta, \delta + n - e - 1]$ where $\delta = \min(A)$ can be
sorted associatively at the beginning of the array in O(n) time using O(1) constant space.

Proof. Given $n \le 2^{w-1}$ distinct integers A[0 ... n - 1] each in the range
$[0, 2^{w-1} - 1]$, it is not possible to construct a monotone bijective hash function that
maps all the integers of the array into $j \in [0, n - 1]$ without additional storage space
[27]. However, a monotone bijective hash function can be constructed as a partial function
[28] that assigns each integer of $A_1 \subset A$ in the range $[\delta, \delta + n - e - 1]$
with $\delta = \min(A)$ to exactly one element in $j \in [e, n - 1]$. The partial monotone
bijective hash function of this form is

$$j = A[i] - \delta + e \quad \text{if } A[i] - \delta + e < n \tag{4.3}$$

With this definition, the proof has the three basic steps of associative sort:

(i) Practice all the integers of the interval $[\delta, \delta + n - e - 1]$ into
Im[e ... n - 1] over A[e ... n - 1].

(ii) Store the nodes at the beginning of the array (short-term memory) in order. If at
least $2^{w-1-\log n}$ idle integers are practiced by a particular node, find the nearest idle
integer searching to the right, move it immediately after that node and write
the absolute position of the node over the idle integer by modifying it. Otherwise,
i.e., if fewer than $2^{w-1-\log n}$ idle integers are practiced by a particular node, encode
the absolute position of the node into log n bits of its record, where the remaining
w - 1 - log n bits store the number of idle integers.

(iii) Retrieve the encoded information from the short-term memory, processing the records
backwards to construct the sorted permutation of the practiced interval. If the MSB of
a word is 1, then it is a node and its record stores both the absolute position of
that node and the number of idle integers practiced by that node. Otherwise, i.e.,
if the MSB of a word is 0, then it is an idle integer brought immediately after
a node that has practiced at least $2^{w-1-\log n}$ idle integers. Hence, read the absolute
position of the node from the idle integer and decode the number of idle integers
from the record of the node (the predecessor of the idle integer).

□ 
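
The partial hash function of Eqn. 4.3 can be illustrated with a minimal Python sketch. The function name slot and the sample values are hypothetical and chosen only for illustration; δ and e are assumed to be known already.

```python
def slot(a, delta, e, n):
    """Return j = a - delta + e (Eqn. 4.3), or None when a lies outside
    the practiced interval [delta, delta + n - e - 1]."""
    j = a - delta + e
    return j if e <= j < n else None


# Hypothetical example: n = 8 slots, recovery area e = 2, minimum delta = 3.
values = [3, 5, 7, 8, 9, 12]
slots = [slot(a, 3, 2, 8) for a in values]
```

Increasing values inside the interval receive increasing slots, which is the monotone property the proof relies on; values outside the interval receive no slot, which is what makes the function partial.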

4.1 Practicing Phase 

Practicing is the process of encoding the necessary information required to recollect the
sorted permutation of the practiced, i.e., received, processed and combined interval. An
individual iteration over the array can practice only the distinct integers, disregarding
other occurrences in the predefined interval of the ILS, creating nodes exactly equal to
the number of distinct integers in that interval. Such an iteration is "practicing distinct
integers of an interval". On the other hand, an iteration can practice all the integers
of the array that are in the predefined interval of the ILS. For instance, if the number
of distinct integers in a given interval that create a node is n_d and the number of total
occurrences other than those particular distinct integers is n_c, then n_d nodes are created
practicing (counting) the other n_c integers that become idle, hence become meaningless. Such
an iteration is "practicing all the integers of an interval" and one can query the existence
and number of occurrences of a given value in that interval in O(1) time after practicing.

Algorithm A. Practice all the integers of the interval [δ, δ+n-e-1] into the ILS
Im[e...n-1] over A[e...n-1] using Eqn. 4.3. It is assumed that the minimum of the
array δ = min(A) is known and e is calculated by Eqn. 4.2.

A1. initialize i = 0;

A2. if A[i] < δ, then A[i] is an idle integer of an interval that has already been sorted.
Hence, increase i by one and repeat this step;

A3. if MSB of A[i] is 1, then A[i] is a node. Hence, increase i by one and goto step A2;

A4. if A[i] - δ + e ≥ n, then A[i] is an integer that is out of the practiced interval. Hence,
increase n'_d by one, which counts the number of integers out of the practiced interval,
update δ' = min(δ', A[i]), increase i by one and goto step A2;

A5. otherwise, A[i] is an integer to be practiced. Hence, calculate j = A[i] - δ + e;

A6. if MSB of A[j] is 0, then A[i] is the first integer that will create the node at j. Move
A[j] to A[i], clear A[j] and set its MSB to 1 making it a node. If j < i increase i
by one. Increase n_d by one, which counts the number of distinct integers (nodes), and
goto step A2;

A7. otherwise, A[j] is a node that has already been created. Hence, clear MSB of A[j],
increase A[j] by one (number of idle integers) and set its MSB back to 1. Increase
both i and n_c by one (n_c counts the number of total idle integers over all distinct
integers) and goto step A2.
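
The bit manipulation in steps A3, A6 and A7 can be sketched with a few Python helpers. The word width W = 32 and all function names are assumptions made for illustration only.

```python
W = 32                    # assumed word width w
MSB = 1 << (W - 1)        # tag bit that marks a word as a node


def is_node(x):
    """Step A3: a word is a node when its most significant bit is set."""
    return (x & MSB) != 0


def new_node():
    """Step A6: clear the word and set its MSB."""
    return MSB


def add_idle(x):
    """Step A7: clear the MSB, count one more idle integer, set the MSB back."""
    return ((x & ~MSB) + 1) | MSB


def idle_count(x):
    """Number of idle integers counted by a node (all bits below the MSB)."""
    return x & (MSB - 1)


rec = add_idle(add_idle(new_node()))   # a node that has practiced two idle integers
```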

Remark 4.1. It should be noted that all three phases of associative integer sorting
can be implemented using the underlying signed integer notation of computers. As long
as the representable negative integers are one more than the positive ones (as in two's
complement), -1 can be used to tag a word as a node. Afterwards, the node can be decreased
by one for each practiced idle key. In such a case, there is no need to struggle with
bitwise operations. For instance, if we consider Algorithm A, it becomes,
A1'. initialize i = 0;

A2'. if A[i] < δ, increase i by one and repeat this step;

A3'. if A[i] - δ + e ≥ n, increase n'_d by one, update δ' = min(δ', A[i]), increase i by one
and goto step A2';

A4'. otherwise, A[i] is an integer to be practiced. Hence, calculate j = A[i] - δ + e;

A5'. if A[j] >= 0, move A[j] to A[i], and set A[j] = -1 making it a node. If j < i
increase i by one. Increase n_d by one, and goto step A2';

A6'. otherwise, decrease A[j] by one. Increase both i and n_c by one and goto step A2'.
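
A minimal Python sketch of the signed variant follows. It assumes non-negative input integers with δ ≥ 0, so that every negative word is a node; the names practice and occurrences, the explicit loop bound i < n, and the sample data are hypothetical additions for illustration.

```python
def practice(A, delta, e):
    """Practice all integers of [delta, delta + n - e - 1] into slots A[e..n-1].

    Returns (n_d, n_c, n_out, delta_next): nodes created, idle integers counted,
    integers outside the interval, and the minimum of those (None if there are none).
    """
    n = len(A)
    n_d = n_c = n_out = 0
    delta_next = None
    i = 0
    while i < n:
        a = A[i]
        if a < delta:                    # A2': a node (negative) or an old idle integer
            i += 1
            continue
        j = a - delta + e                # A4': slot given by Eqn. 4.3
        if j >= n:                       # A3': outside the practiced interval
            n_out += 1
            delta_next = a if delta_next is None else min(delta_next, a)
            i += 1
        elif A[j] >= 0:                  # A5': first occurrence creates the node
            A[i] = A[j]
            A[j] = -1
            n_d += 1
            if j < i:
                i += 1
        else:                            # A6': repeated occurrence becomes idle
            A[j] -= 1
            n_c += 1
            i += 1
    return n_d, n_c, n_out, delta_next


def occurrences(A, v, delta, e):
    """O(1) query after practicing: how often v occurred in the practiced interval."""
    j = v - delta + e
    return -A[j] if e <= j < len(A) and A[j] < 0 else 0


data = [7, 3, 5, 3, 9, 3, 5, 12]
counters = practice(data, 3, 2)          # interval [3, 8], slots data[2..7]
```

With this tagging a node that has counted k idle integers holds -(k + 1), so the number of occurrences of a practiced value is simply the negated node. The query only answers for values inside the practiced interval; 9 and 12 in the sample are left for a later pass.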

Associative Range Queries. Instead of practicing all the integers in an interval as
in Algorithm A, one can practice only the distinct integers in an interval, for instance
[δ, δ+n/2-1] into Im[n/2...n-1] over A[n/2...n-1], writing each created node's
position to the previously created node's record. Hence, an associative linked array is
obtained in O(n) time that can answer range queries such as: "Which are the integers in
the range [δ, δ+n/2-1]?" in O(n_d) time. Furthermore, if a secondary ILS Im[0...n/2-1]
is created which does not overlap with the primary one, then all the idle integers of the
same interval can be practiced and mapped to the secondary ILS. This constructs a further
association between the subspaces through matching node positions with respect to each
subspace basis. As a result, while answering range queries using the primary subspace, the
number of integers can be queried with O(1) secondary subspace access. When finished,
all the integers mapped to the imaginary subspaces can be retrieved back and the queries
can continue with another interval of interest.



4.2 Storing Phase 

Storing is the process of creating permanent records in the short-term memory in a system- 
atic and organized way where the received, processed and combined information during 
practicing is encoded into the records of the nodes to cue the recall of the necessary in- 
formation that will be used to construct the sorted permutation of the practiced interval. 

Practicing creates n_d nodes and n_c idle integers. This means n_d distinct integers of the
practiced interval are mapped into the ILS creating nodes that are dispersed with relative
order in Im[e...n-1] over A[e...n-1] depending on the statistical distribution of the
integers. On the other hand, n_c idle integers are distributed disorderly together with n'_d
integers out of the practiced interval in the array space.

In storing phase, the nodes are clustered in a systematic way, i.e., the gaps between
the nodes of the ILS are closed to a direction without altering their relative order with
respect to each other. When the nodes are moved towards a direction, it is not possible
to retain the association between the nodes and the integers. Furthermore, we cannot
freely use log n bits of a record of the node to encode its absolute position because the
node has practiced other occurrences (idle integers) and if the number of the idle integers
occupies more than w - 1 - log n bits, it would not be possible to encode the absolute
position of the node into the record. But, the maximum number of nodes that will need
an idle integer immediately after itself during storing is equal to,

e = \^^] (4.4) 

I log"- 

Hence, by mapping the integers to Im[e...n-1] over A[e...n-1], we create a recovery
area exactly equal to e which prevents collisions during storing and lets us bring an
idle integer immediately after a particular node that has practiced at least 2^(w-1-log n) idle
integers.

Algorithm B. Store the encoded information of the practiced interval in the short term
memory. If at least 2^(w-1-log n) idle integers are practiced by a particular node, find the
nearest idle integer on the right side searching forward and move it immediately after
that particular node and write the absolute position of the node over the idle integer by
modifying it. Hence, the absolute position of the node can be recalled from the idle integer
during retrieval. In such a case, the record of the node preceding the idle integer in the
short term memory only stores the number of idle integers. Otherwise, i.e., if less than
2^(w-1-log n) idle integers are practiced by a particular node, encode the absolute position of
the node into log n free bits of the record together with the bits (other than the tag bit)
that keep the number of idle integers.

B1. initialize i = e, j = 0, k = n_d, and e' = 0 which will count the exact number of nodes
that have practiced at least 2^(w-1-log n) idle integers;

B2. if MSB of A[i] is 0, then A[i] is either an idle integer or an integer that is out of the
practiced interval. Hence, increase i and repeat this step;

B3. otherwise, A[i] is a node. Hence, get the number of practiced idle integers into s;

B4. if s + 1 < 2^(w-1-log n), then encode the absolute position i (log n bits) of the node into
s, move A[j] to A[i], and write s to A[j]. Increase i and j and decrease k. If k = 0
exit, otherwise goto step B2;

B5. otherwise, i.e., if s + 1 ≥ 2^(w-1-log n), then swap A[i] with A[j]. Find one of n_c idle
integers on the right starting a search from A[p] where p is either equal to j + 1 if
this is the first time that a search is started or equal to the last idle integer position
found in the previous search. Move A[j+1] to this safe location and write
the absolute position i of the node over A[j+1]. Increase i and e' by one and j by
two and decrease k by one. If k = 0 exit, otherwise goto step B2.
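
The record written in step B4 can be sketched as follows. The text fixes only the field widths (one tag bit, log n position bits, w - 1 - log n count bits); the bit order used here, the constants W = 32 and LOG_N = 10, and the function names are assumptions for illustration.

```python
W = 32                                # assumed word width w
LOG_N = 10                            # assumed log n (up to 1024 positions)
MSB = 1 << (W - 1)                    # tag bit of a node record
LIMIT = 1 << (W - 1 - LOG_N)          # 2^(w-1-log n)


def fits(idle):
    """A node keeps its own position only if its idle count stays below LIMIT."""
    return idle < LIMIT


def encode(position, idle):
    """Assumed layout: tag bit | idle count | absolute position."""
    assert fits(idle) and 0 <= position < (1 << LOG_N)
    return MSB | (idle << LOG_N) | position


def decode(record):
    """Return (position, idle) from a record produced by encode."""
    body = record & (MSB - 1)
    return body & ((1 << LOG_N) - 1), body >> LOG_N


record = encode(700, 5)
```

A node whose count does not fit is the case handled by step B5, where the position is written into a neighbouring idle integer instead.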

4.3 Retrieval Phase 

Retrieval is the reverse of storing. The sorted permutation of the practiced interval is 
constructed using the stored information in the short-term memory. On the other hand, 
getting the integer from the ILS back to the array space is integer retrieval where the 
positional information stored in the record of a node cues the recall of the integer using 
the inverse hash function. 

Storing clusters n_d nodes and e' idle integers at A[0...n_d+e'-1] with the necessary
information required to construct the sorted permutation of the practiced interval. Hence,
A[0...n_d+e'-1] can be thought of as a short term memory where the encoded information
of the practiced interval is stored. On the other hand, n_c - e' idle integers and n'_d integers
out of the practiced interval are distributed disorderly together at A[n_d+e'...n-1].

In retrieval phase, the stored information is retrieved from the short term memory
A[0...n_d+e'-1] to construct the sorted permutation of the practiced interval. The short
term memory encodes n_d + n_c integers with n_d + e' permanent records. It is important to
note that, if the number of occurrences of a particular integer is n_i, then there are n_i - 1
idle integers in the array. But the node itself represents the integer that is mapped into
the ILS through itself. Hence, it is immediate from this definition that the nodes in the
short term memory A[0...n_d+e'-1] can be processed from right to left backwards and
the integers practiced by each node can be expanded over A[0...n_d+n_c-1] sequentially
right to left backwards without collision. At this point, we have two options: sequential
or recursive version. But before proceeding, the retrieval phase will be introduced;

Algorithm C. Retrieve the encoded information from n_d + e' records of the short term
memory A[0...n_d+e'-1] to construct sorted permutation of n_d + n_c integers of the
practiced interval. Process the records from right to left backwards and expand the
integers over A[0...n_d+n_c-1] sequentially right to left backwards.

C1. initialize i = n_d + e' - 1 and p = n_d + n_c - 1;

C2. check MSB of A[i];

(i) if MSB of A[i] is 1, then it is a node. Hence, decode from the record of the
node the number of idle integers practiced by the node to k (does not include
the integer that creates the node) and absolute position of the node to j and
decrease i by one;

(ii) otherwise, A[i] is an idle integer brought immediately after a node. Hence,
get the absolute position of the node from the idle integer to j and get the
number of idle integers practiced by the node from its record at A[i-1] to k
(does not include the integer that creates the node) and decrease i by two;

C3. retrieve the integer from the ILS: absolute position j of the node cues the recall of
the integer using the inverse hash function. Then copy the integer to A[p-k...p],
decrease p by k + 1 and goto step C2.



Sequential Version. After storing the encoded information into the short term memory,
n_c - e' idle integers and n'_d integers out of the practiced interval are distributed disorderly
together at A[n_d+e'...n-1]. If we partition A[n_d+e'...n-1] selecting the pivot
equal to δ', then idle integers are clustered after the short term memory. Therefore,
Algorithm C can immediately be used to retrieve. Hence, the structure of the sequential
version becomes;

Algorithm D. In each iteration, construct sorted permutation of n_d + n_c integers of the
practiced interval at the beginning of the array.

D1. find min(A) and max(A);

D2. initialize e using Eqn. 4.2, δ = min(A), δ' = max(A) and reset counters;

D3. practice all the integers in the interval [δ, δ+n-e-1] using Algorithm A;

D4. store encoded information of the practiced interval using Algorithm B;

D5. in-place partition A[n_d+e'...n-1] clustering n_c - e' idle integers at the beginning;

D6. retrieve the sorted permutation of the practiced interval using Algorithm C;

D7. if n'_d = 0 exit. Otherwise set A = A[n_d+n_c...n-1], n = n'_d, δ = δ', δ' = max(A),
reset counters, calculate e using Eqn. 4.2 and goto step D3.
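
The overall structure of the sequential version can be sketched in Python under a strong simplifying assumption: Python integers are unbounded, so every record has room for both the position and the idle count, no node ever needs a companion idle integer, and the recovery area is empty (e = e' = 0). The sketch uses the signed tagging of Remark 4.1 and a hypothetical record layout -(1 + idle * m + position); it assumes non-negative inputs and is an illustration of steps D3 to D7, not the paper's implementation.

```python
def associative_sort(A):
    """Sort a list of non-negative integers in place, one interval per pass."""
    n = len(A)
    lo = 0                                    # A[:lo] is already sorted
    delta = min(A) if A else 0
    while lo < n:
        m = n - lo                            # size of the unsorted part
        n_d = n_c = 0
        delta_next = None
        # D3, practice: v in [delta, delta + m - 1] maps to slot lo + v - delta.
        i = lo
        while i < n:
            a = A[i]
            if a < 0:                         # node created earlier in this pass
                i += 1
            elif a - delta >= m:              # outside the practiced interval
                delta_next = a if delta_next is None else min(delta_next, a)
                i += 1
            else:
                j = lo + a - delta
                if A[j] >= 0:                 # first occurrence creates the node
                    A[i], A[j] = A[j], -1
                    n_d += 1
                    if j < i:
                        i += 1
                else:                         # repeated occurrence becomes idle
                    A[j] -= 1
                    n_c += 1
                    i += 1
        # D4, store: compact the nodes to A[lo:lo + n_d], keeping their order.
        j = lo
        for i in range(lo, n):
            if A[i] < 0:
                idle = -A[i] - 1
                A[i] = A[j]
                A[j] = -(1 + idle * m + (i - lo))
                j += 1
        # D5, partition: gather the idle integers right after the records.
        k = lo + n_d
        for i in range(k, n):
            if A[i] - delta < m:
                A[i], A[k] = A[k], A[i]
                k += 1
        # D6, retrieve: expand the records right to left.
        p = lo + n_d + n_c - 1
        for i in range(lo + n_d - 1, lo - 1, -1):
            idle, pos = divmod(-A[i] - 1, m)
            for t in range(p - idle, p + 1):
                A[t] = delta + pos
            p -= idle + 1
        # D7: continue with the integers that were out of the interval.
        lo += n_d + n_c
        delta = delta_next
    return A


result = associative_sort([7, 3, 5, 3, 9, 3, 5, 12])
```

Each pass sorts at least the current minimum, so the loop terminates; how many passes are needed depends on how the values are spread over their range.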
Remark 4.2. min(A) and max(A) need not be found in step D1. Instead, if δ = 0
and δ' = max(U), the algorithm sorts the integers in the range [0, n-e-1] during
the first iteration (or recursion). However, if there is not any integer in this interval,
Algorithm A finds δ' = min(A) in step D3 in O(n) time, and continues with the integers
in [δ', δ'+n-e-1].

Remark 4.3. Sequential version of associative sort technique is on-line in the sense that
after each step D6, n_d + n_c integers are added to the sorted permutation at the beginning
of the array and ready to be used.

A Different Approach. Instead of using Eqn. 4.2 to calculate the maximum value of
e, there is another approach to solve the same problem, possibly more efficiently:

(i) practice all the integers in the interval [δ, δ+n-1] by mapping them into Im[0...n-1]
over A[0...n-1]. However, during practicing, if the number of idle integers
practiced by a particular node reaches exactly 2^(w-1-log n), then increase e', which
counts the exact number of nodes that have practiced at least 2^(w-1-log n) idle integers;

(ii) retrieve the integers in A[n-e'...n-1] back to the array space;

(iii) shift subspace to the right by e';

(iv) store encoded information of the practiced interval using Algorithm B;

(v) partition A[n_d+e'...n-1] clustering n_c - e' idle integers to the beginning;

(vi) retrieve the sorted permutation of the practiced interval using Algorithm C;

(vii) if n'_d = 0 exit. Otherwise set A = A[n_d+n_c...n-1], n = n'_d, δ = δ', δ' = max(A),
reset counters and goto step

In this case, instead of using the maximum value of e, its exact value e' is counted and
used, which may improve the overall efficiency.



Recursive Version. Saving n_d, e' and δ in stack space, we can recursively call Algorithm A
and Algorithm B. Although the exact number of integers to be sorted in the next level of
recursion is n'_d, the overall number of integers in that recursion is n = n'_d + n_c - e' where
n_c - e' of them are idle integers of the previous recursion and meaningless. However, these
idle integers increase the interval of range of integers spanned by the ILS, improving the
overall time complexity in each level of recursion. The recursion can continue until no
integer remains. In the last recursion, retrieval phase can begin to construct the sorted
permutation of n_d + n_c integers from n_d + e' records stored in the short term memory
S[0...n_d+e'-1] of that recursion and expand over S[0...n-1] right to left backwards.
Each level of recursion should return the total number of integers copied on the array
to the higher level to let it know where it will start to expand its interval. It should
be noticed that, in the recursive version of the technique, there is no need to partition
n_c - e' idle integers from n'_d unpracticed integers. Hence, one step is canceled, improving
the overall efficiency.

Complexity of the algorithm depends on the range and the number of integers. In each
iteration (or recursion) the algorithm is capable of sorting integers that satisfy A[i] - δ + e <
n where e is defined by Eqn. 4.2 and upper bounded by n/2. Hence, at worst case
(n = 2^(w-1)), the integers that satisfy A[i] - δ < n/2 are sorted in the first pass. This
means that, given uniformly distributed n = 2^(w-1) integers A[0...n-1] each in the range
[0, n-1], the complexity is the recursion T(n) = T(n/2) + O(n) yielding T(n) = O(n).
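
Unrolling the recurrence makes the linear bound explicit. This is the standard geometric-series argument; the constant c bounding the cost of one pass is an assumption introduced here for illustration.

```latex
T(n) = T\!\left(\tfrac{n}{2}\right) + cn
     = cn + \tfrac{cn}{2} + \tfrac{cn}{4} + \dots + T(1)
     \le cn \sum_{t=0}^{\infty} 2^{-t} + T(1)
     = 2cn + T(1) = O(n).
```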

Best Case Complexity. Given n integers A[0...n-1], if n - 1 integers satisfy A[i] - δ <
n/2, then these are sorted in O(n) time. In the next step, there is one integer left which
implies sorting is finished. As a result, time complexity of the algorithm is lower bounded
by Ω(n) in the best case.

Worst Case Complexity. Given n integers A[0...n-1] and m = βn, if there is only
1 integer available in practiced interval at each iteration (or recursion) until the last, in
any jth step, the only integer s that will be sorted satisfies s < "'""2 which implies 
that the last alone integer satisfies s < ^"■^[^^^'^ < j3n from where we can calculate j by 
j < MlL^ In this case, the time complexity of the algorithm is. 



O(n) + O(n-1) + ... + O(n-j) = (j+1)O(n) - O(j^2) < (2β+1)O(n)    (4.5)



Therefore, the algorithm is upper bounded by (2β+1)O(n) = O(2m+n) in worst case.

Average Case Complexity. Given n integers A[0...n-1], if m = βn and the integers
are uniformly distributed, this means that n/(2β) integers satisfy A[i] < n/2. Therefore, the
algorithm is capable of sorting n/(2β) integers in O(n) time during first pass. This will
continue until all the integers are sorted. The sum of sorted integers in each iteration can
be represented with the series,

\[ \frac{n}{2\beta} + \frac{n(2\beta-1)}{4\beta^2} + \frac{n(2\beta-1)^2}{8\beta^3} + \dots + \frac{n(2\beta-1)^{k-1}}{2^k\beta^k} + \dots \tag{4.6} \]
It is reasonable to think that the sorting ends when one term is left, which means the sum
of k terms of this series is equal to n - 1, from where we can calculate the number of
iterations or depth of recursion k, which is valid when β > 1/2,

\[ \frac{1}{n} = \frac{(2\beta-1)^k}{(2\beta)^k} \tag{4.7} \]

It is seen from Eqn. 4.7 that when m = n, i.e., β = 1, the number of iterations or depth of
recursion becomes k = log n. It is known that each step takes O(n) time. Therefore, the
time complexity of the algorithm is,

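\[ O(n) + O\!\left(\frac{n(2\beta-1)}{2\beta}\right) + O\!\left(\frac{n(2\beta-1)^2}{(2\beta)^2}\right) + \dots + O\!\left(\frac{n(2\beta-1)^{k-1}}{(2\beta)^{k-1}}\right) \tag{4.8} \]

from where we can obtain, by defining x = (2β-1)/(2β),

\[ O(n)\,(1 + x + x^2 + x^3 + \dots + x^{k-1}) = O(n)\left(\frac{1 - x^k}{1 - x}\right) < 2\beta\,O(n) \tag{4.9} \]

which means that the algorithm is upper bounded by 2βO(n) or 2O(m) in the average
case.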
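
The average-case estimate can be checked numerically with a short Python sketch. It assumes Eqn. 4.7 reads ((2β-1)/(2β))^k = 1/n and that the bracket in Eqn. 4.9 is the geometric sum (1 - x^k)/(1 - x) with x = (2β-1)/(2β); the function names are hypothetical.

```python
import math


def passes(n, beta):
    """Number of passes k solving ((2*beta - 1) / (2*beta))**k == 1/n, for beta > 1/2."""
    x = (2 * beta - 1) / (2 * beta)
    return math.log(n) / -math.log(x)


def work_factor(n, beta):
    """Geometric sum 1 + x + ... + x**(k-1) = (1 - x**k) / (1 - x), using x**k = 1/n."""
    x = (2 * beta - 1) / (2 * beta)
    return (1 - 1 / n) / (1 - x)


k_uniform = passes(1024, 1)        # beta = 1, i.e. m = n
factor = work_factor(1024, 3)      # beta = 3, i.e. m = 3n
```

For β = 1 the number of passes is log n to base 2, and for any β > 1/2 the work factor equals 2β(1 - 1/n), which stays below the 2β of Eqn. 4.9.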



5 Conclusions 

In this study, in-place associative integer sorting technique is introduced. Using the 
technique, the main difficulties of distributive sorting algorithms are solved by its inherent 
three basic steps namely (i) practicing, (ii) storing and (iii) retrieval which are three main 
stages in the formation and retrieval of memory in cognitive neuroscience. The technique 
is very simple and straightforward and around 30 lines of C code is enough. 

The technique sorts the integers using O(1) extra space in O(n + m) time for the
worst, O(m) time for the average (uniformly distributed integers) and O(n) time for the
best case. It shows similar characteristics with bucket sort and distribution counting sort
and hence can be thought of as in-place associative bucket sort or in-place associative
distribution counting sort. However, it is more time-space efficient than both. The ratio m/n
defines the efficiency (time-space trade-offs), letting very large arrays be sorted in-place.
Furthermore, the dependency of the efficiency on the distribution of the integers is O(n),
which means it replaces all the methods based on address calculation, that are known to
be very efficient when the integers have known (usually uniform) distribution and require
additional space more or less proportional to n. Hence, associative sort asymptotically
outperforms all content based sorting algorithms when m/n = c and c is the efficiency
constant determined by the other sorting algorithms regardless of how large the array is.

The technique seems to be very flexible, efficient and applicable for other problems, 
as well, such as membership and range queries, hashing, searching, element distinction, 
succinct data structures, gaining space, etc. For instance, gaining space is an inherent 
step of associative sort which improves its performance and can be used explicitly, as well. 

The drawbacks of the algorithm are that it is unstable and that it is suitable only for
value-sorting. But, an ILS can create other subspaces and associations using the idle integers
that were already practiced by manipulating either their position or value or both. Hence,
different techniques can be developed to solve such problems.